# ViT architecture

The models below are all built on the Vision Transformer (ViT) architecture. Each entry lists the publisher, license, task tags, a short description, and the download and like counts reported by the catalog.

| Model | Publisher | License | Tags | Description | Downloads | Likes |
|---|---|---|---|---|---|---|
| Ade20k Panoptic Eomt Large 640 | tue-mps | MIT | Image Segmentation, PyTorch | Reinterprets a plain Vision Transformer (ViT) as an image segmentation model, demonstrating ViT's potential for segmentation tasks. | 105 | 0 |
| Ade20k Panoptic Eomt Giant 640 | tue-mps | MIT | Image Segmentation | Adapts the ViT architecture specifically for segmentation, showing its potential on image segmentation tasks (giant variant). | 116 | 0 |
| Vit Base Patch16 Clip 224.dfn2b | timm | Other | Image Classification, Transformers | Vision Transformer based on the CLIP architecture, carrying the DFN2B-CLIP image encoder weights released by Apple. | 444 | 0 |
| Llm Jp Clip Vit Base Patch16 | llm-jp | Apache-2.0 | Text-to-Image, Japanese | Japanese CLIP model trained with the OpenCLIP framework, supporting zero-shot image classification. | 40 | 1 |
| Vit Base Patch16 Clip 224.laion400m E31 | timm | MIT | Image Classification | Vision Transformer trained on the LAION-400M dataset, supporting zero-shot image classification. | 1,469 | 0 |
| Vit Base Patch32 Clip 224.laion400m E32 | timm | MIT | Image Classification | Vision Transformer trained on the LAION-400M dataset, compatible with both the OpenCLIP and timm frameworks. | 5,957 | 0 |
| Vit Facial Expression Recognition | Alpiyildo | Not specified | Face-related, Transformers | Facial expression recognition model based on the ViT architecture, fine-tuned on an imagefolder dataset with 91.77% accuracy. | 581 | 1 |
| Vit Base Violence Detection | jaranohaal | Apache-2.0 | Image Classification, Transformers, English | Violence detection model built on the ViT architecture, classifying images into violent or non-violent scenes. | 2,140 | 6 |
| Vit Facial Expression Recognition | motheecreator | Not specified | Face-related, Transformers | ViT-based facial expression recognition model fine-tuned on FER2013, MMI, and AffectNet, recognizing seven basic emotions. | 4,221 | 13 |
| AI VS REAL IMAGE DETECTION | Hemg | Apache-2.0 | Image Classification, Transformers | Classifier fine-tuned from Google's ViT architecture to distinguish AI-generated images from real images. | 259 | 2 |
| Vit Base Nsfw Detector | AdamCodd | Apache-2.0 | Image Classification, Transformers | ViT-based classifier that detects whether images contain NSFW (Not Safe For Work) content. | 1.2M | 47 |
| Vitforimageclassification | Andron00e | Apache-2.0 | Image Classification, Transformers | Fine-tune of google/vit-base-patch16-224-in21k on CIFAR-10, reaching 96.78% accuracy. | 43 | 2 |
| Vit Finetuned Vanilla Cifar10 0 | 02shanky | Apache-2.0 | Image Classification, Transformers | ViT-based classifier fine-tuned on CIFAR-10, reaching 99.2% accuracy. | 68 | 1 |
| Phikon | owkin | Other | Image Classification, Transformers, English | Self-supervised model for histopathology trained with iBOT, used mainly to extract features from histology image patches. | 741.63k | 30 |
| Dinov2 Small | facebook | Apache-2.0 | Image Classification, Transformers | Small Vision Transformer trained with the self-supervised DINOv2 method, used to extract image features. | 5.0M | 31 |
| Sam Vit Base | facebook | Apache-2.0 | Image Segmentation, Transformers, Other | SAM generates high-quality object masks from input prompts such as points or boxes, and supports zero-shot segmentation. | 635.09k | 137 |
| Clasificacion Vit Model Manuel Chaves | machves | Apache-2.0 | Image Classification, Transformers | Classifier fine-tuned from google/vit-base-patch16-224-in21k, reaching 97.74% accuracy on the beans dataset. | 15 | 0 |
| Vit Base Railspace | Kaspar | Apache-2.0 | Image Classification, Transformers | Vision Transformer fine-tuned from google/vit-base-patch16-224-in21k, reaching 99.26% accuracy on its evaluation set. | 18 | 2 |
| VIT Food101 Image Classifier | StatsGary | Not specified | Image Classification, Transformers | Food image classifier based on the ViT architecture, trained on the Food101 dataset with 93.3% accuracy. | 41 | 0 |
| Vit Base Patch16 224 In21k Lcbsi | polejowska | Apache-2.0 | Image Classification, Transformers | Fine-tuned model based on Google's ViT architecture, suited to image classification tasks. | 33 | 0 |
| Vit Base Patch16 224 In21k Ft Cifar10test | minhhoque | Apache-2.0 | Image Classification, Transformers | Classifier based on Google's ViT, fine-tuned on the CIFAR-10 test set. | 29 | 0 |
| Vit Base Patch16 224 Finetuned Cifar10 | Weili | Apache-2.0 | Image Classification, Transformers | ViT-based classifier fine-tuned on CIFAR-10, reaching 98.76% accuracy. | 15 | 0 |
| Vit Base Patch32 224 In21 Leicester Binary | davanstrien | Apache-2.0 | Image Classification, Transformers | Binary image classifier based on Google's ViT architecture, fine-tuned on a specific dataset for high-precision classification. | 15 | 0 |
| Vit Base Beans | christyli | Apache-2.0 | Image Classification, Transformers | Classifier fine-tuned from Google's ViT base model on the beans dataset, with 97.74% accuracy. | 31 | 0 |
| Syn10kplusog Oct ViT Base 8Epochs V1 | g30rv17ys | Not specified | Image Classification, Transformers | ViT-based image classifier reaching 88.67% accuracy after 8 epochs of training. | 13 | 0 |
| Syn10k Oct ViT Base 8Epochs V1 | g30rv17ys | Not specified | Image Classification, Transformers | ViT-based image classifier reaching 92.5% accuracy after 8 epochs of training. | 13 | 0 |
| Yolos Small Balloon | zoheb | Not specified | Object Detection, Transformers | YOLOS object detector built on the ViT architecture, trained with the DETR loss and fine-tuned on the COCO and Matterport Balloon datasets. | 101 | 1 |
| Vit Base Patch16 224 In21k Finetuned Cassava3 | siddharth963 | Apache-2.0 | Image Classification, Transformers | Classifier based on Google's ViT, fine-tuned on an imagefolder dataset with 88.55% accuracy. | 13 | 1 |
| Syn Oct ViT Base 4Epochs 30c V2 Run | g30rv17ys | Not specified | Image Classification, Transformers | ViT-based classifier trained on an OCT image dataset with 86.67% accuracy. | 13 | 0 |
| Vit Base Mnist | farleyknight-org-username | Apache-2.0 | Image Classification, Transformers | ViT-based classifier fine-tuned on MNIST, reaching 99.49% accuracy. | 1,770 | 8 |
| Vit Base Patch16 224 In21k Finetuned Eurosat | Chandanab | Apache-2.0 | Image Classification, Transformers | ViT-based classifier fine-tuned on an image_folder dataset with 90.17% accuracy. | 16 | 0 |
| Vit Base Patch16 224 In21k Ucsat | YKXBCi | Apache-2.0 | Image Classification, Transformers | Vision Transformer classifier fine-tuned on an unspecified dataset. | 31 | 0 |
| Garbage Classification | yangy50 | Not specified | Image Classification, Transformers | Garbage classification model based on the ViT architecture, reaching 95% test accuracy on a six-class garbage dataset. | 165 | 1 |
| Ak Vit Base Patch16 224 In21k Image Classification | amitkayal | Apache-2.0 | Image Classification, Transformers | Classifier based on Google's ViT, fine-tuned on a custom image dataset with a reported evaluation accuracy of 100%. | 19 | 0 |
| Vit Base Patch16 224 In21k Eurosat | YKXBCi | Apache-2.0 | Image Classification, Transformers | Fine-tune of google/vit-base-patch16-224-in21k on an unspecified dataset, intended for image classification. | 23 | 0 |
| Violation Classification Bantai Vit V100ep | AykeeSalazar | Apache-2.0 | Image Classification, Transformers | ViT-based classifier for prohibited-content recognition, reaching 91.57% accuracy on its evaluation set. | 32 | 0 |
| Vit Base Patch16 224 In21k Finetuned Cifar10 | aaraki | Apache-2.0 | Image Classification, Transformers | Pre-trained Google ViT fine-tuned on CIFAR-10 for image classification. | 16.69k | 10 |
| Vision Transformer Fmri Classification Ft | shivkumarganesh | Not specified | Image Classification, Transformers | fMRI image classification model based on the ViT architecture, generated automatically with HuggingPics. | 82 | 3 |
| Vit Base Patch16 224 In21k Eurosat | philschmid | Apache-2.0 | Image Classification, Transformers | Remote-sensing image classifier fine-tuned from Google's ViT on the EuroSAT dataset. | 28 | 1 |
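Most of the classification entries above are standard ViT fine-tunes and can be loaded through the Hugging Face `transformers` image-classification pipeline. A minimal sketch follows; the repository id `aaraki/vit-base-patch16-224-in21k-finetuned-cifar10` is assumed to correspond to the "Vit Base Patch16 224 In21k Finetuned Cifar10" entry by aaraki, and `example.jpg` is a placeholder image path.

```python
from transformers import pipeline

# Assumed repo id for the aaraki CIFAR-10 fine-tune listed above;
# substitute any of the other classification checkpoints.
MODEL_ID = "aaraki/vit-base-patch16-224-in21k-finetuned-cifar10"

# The image-classification pipeline wraps the image processor and
# ViTForImageClassification head behind one call.
classifier = pipeline("image-classification", model=MODEL_ID)

# Accepts a local path, URL, or PIL image; returns label/score pairs.
predictions = classifier("example.jpg")
for p in predictions:
    print(f'{p["label"]}: {p["score"]:.3f}')
```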
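The self-supervised backbones in the list (Dinov2 Small, Phikon) are typically used as frozen feature extractors rather than end-to-end classifiers. A sketch of extracting global and per-patch features from `facebook/dinov2-small` with `transformers`, again assuming a local `example.jpg`:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-small")
model = AutoModel.from_pretrained("facebook/dinov2-small")
model.eval()

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Token 0 is the [CLS] token: one global embedding per image.
cls_embedding = outputs.last_hidden_state[:, 0]
# The remaining tokens are per-patch embeddings, useful for dense tasks.
patch_embeddings = outputs.last_hidden_state[:, 1:]
print(cls_embedding.shape, patch_embeddings.shape)
```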
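Sam Vit Base differs from the classifiers in that it is prompt-driven: the model takes an image plus point or box prompts and returns candidate masks with quality scores. A sketch of the usage documented for the `transformers` SAM classes, assuming a single point prompt at pixel (450, 600) on a local image:

```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
model = SamModel.from_pretrained("facebook/sam-vit-base")

image = Image.open("example.jpg").convert("RGB")
# One (x, y) point prompt for the single input image.
input_points = [[[450, 600]]]

inputs = processor(image, input_points=input_points, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Rescale the predicted masks back to the original image resolution.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print(masks[0].shape, outputs.iou_scores)
```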